STATS 32 Session 5: Data Analysis Projects

Kenneth Tay

Oct 16, 2018

Recap of week 2

Agenda for today

“Official” cheat sheet for readr available here.

Function syntax

The most important syntax in R is the function call. Every operation in R, including operators such as + and <-, is a function call underneath.

function_name(<inputs to the function>,
              <arguments which change 
              how the function operates>)
x <- c(-5, -3, -1, 1, 3, NA)
mean(x, na.rm = TRUE)
## [1] -1

Function calls read “inside out”

abs(x): If x is positive, return x. If x is negative, return x without the negative sign.

mean(abs(x), na.rm = TRUE)
## [1] 2.6
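
To make the "inside out" reading concrete, here is a minimal sketch that splits the nested call into two steps. The intermediate names abs_x and result are hypothetical, introduced only for illustration.

```r
x <- c(-5, -3, -1, 1, 3, NA)

# Step 1: the innermost call is evaluated first
abs_x <- abs(x)

# Step 2: the outer call then works on the result of the inner one
result <- mean(abs_x, na.rm = TRUE)

# Both forms should give the same answer as the nested call
result
mean(abs(x), na.rm = TRUE)
```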

%>% syntax with dplyr

Take the mtcars dataset, select just the wt and mpg columns, then keep only the rows with mpg < 15:

library(dplyr)
mtcars %>%
    select(wt, mpg) %>%
    filter(mpg < 15)
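
One way to see what %>% does is to compare the piped version with the equivalent nested function calls. This is a sketch only; the names piped and nested are hypothetical.

```r
library(dplyr)

# Piped version: reads left to right, top to bottom
piped <- mtcars %>%
    select(wt, mpg) %>%
    filter(mpg < 15)

# Nested version: the same steps, but read "inside out"
nested <- filter(select(mtcars, wt, mpg), mpg < 15)

# Check that the two approaches return the same data frame
identical(piped, nested)
```

The pipe passes the result on its left as the first argument of the function on its right, which is why the two versions should agree.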

+ syntax with ggplot2

library(ggplot2)
ggplot(data = mtcars, mapping = aes(x = wt, y = hp)) +
    geom_point() +
    labs(title = "Horsepower vs. Weight", x = "Weight", 
         y = "Horsepower") +
    theme_classic()

Scripts in R

Working directories in R

Projects in R

Today’s dataset: Drought in California

Data source: United States Drought Monitor (USDM)

USDM: data download

USDM: data selection

The data in Excel

Optional material

Different packages for working with different data formats

USDM: data selection details

tidyr functions: gather and spread

gather: Used when some column names are not variables, but values of a variable

(Source: R for Data Science)
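
A minimal sketch of gather on made-up data (the data frame cases_wide and its numbers are hypothetical, not taken from the drought dataset). Here the column names 1999 and 2000 are really values of a "year" variable.

```r
library(tidyr)

# Hypothetical toy data: one column per year
cases_wide <- data.frame(
    country = c("A", "B", "C"),
    `1999` = c(10, 20, 30),
    `2000` = c(15, 25, 35),
    check.names = FALSE,       # keep "1999" and "2000" as column names
    stringsAsFactors = FALSE
)

# Collect the year columns into a key column (year) and a value column (cases)
cases_long <- gather(cases_wide, key = "year", value = "cases", `1999`, `2000`)
cases_long
```

In more recent versions of tidyr, pivot_longer covers the same use case, but gather is still available.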

spread: Opposite of gather

(Source: R for Data Science)
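
A minimal sketch of spread on made-up data (the data frame counts_long and its values are hypothetical). Each observation is scattered across two rows, one for cases and one for population.

```r
library(tidyr)

# Hypothetical toy data: the "type" column holds variable names
counts_long <- data.frame(
    country = c("A", "A", "B", "B"),
    type = c("cases", "population", "cases", "population"),
    count = c(10, 1000, 20, 2000),
    stringsAsFactors = FALSE
)

# Turn each value of "type" into its own column, filled from "count"
counts_wide <- spread(counts_long, key = type, value = count)
counts_wide
```

In more recent versions of tidyr, pivot_wider covers the same use case, but spread is still available.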

tidyr functions: separate and unite

separate: Used to separate values in one column into multiple columns

(Source: R for Data Science)
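
A minimal sketch of separate on made-up data (the data frame rates and its values are hypothetical). The rate column packs two values into one string.

```r
library(tidyr)

# Hypothetical toy data: "rate" holds cases and population joined by "/"
rates <- data.frame(
    country = c("A", "B"),
    rate = c("10/1000", "20/2000"),
    stringsAsFactors = FALSE
)

# Split "rate" at the "/" into two new columns
rates_split <- separate(rates, rate, into = c("cases", "population"), sep = "/")
rates_split
```

The new columns are character columns by default; passing convert = TRUE asks separate to convert them to a more suitable type.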

unite: Opposite of separate

(Source: R for Data Science)
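
A minimal sketch of unite on made-up data (the data frame year_parts and the column name full_year are hypothetical). The year is split across a century column and a two-digit year column.

```r
library(tidyr)

# Hypothetical toy data: year stored in two pieces
year_parts <- data.frame(
    century = c("19", "20"),
    year = c("99", "00"),
    stringsAsFactors = FALSE
)

# Paste the two columns together into one; sep = "" avoids the default "_"
year_united <- unite(year_parts, col = "full_year", century, year, sep = "")
year_united
```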